10 research outputs found

    Effects of intermittent faults on the reliability of a Reduced Instruction Set Computing (RISC) microprocessor

    Full text link
    © 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.With the scaling of complementary metal-oxide-semiconductor (CMOS) technology to the submicron range, designers have to deal with a growing number and variety of fault types. In this way, intermittent faults are gaining importance in modern very large scale integration (VLSI) circuits. The presence of these faults is increasing due to the complexity of manufacturing processes (which produce residues and parameter variations), together with special aging mechanisms. This work presents a case study of the impact of intermittent faults on the behavior of a reduced instruction set computing (RISC) microprocessor. We have carried out an exhaustive reliability assessment by using very-high-speed-integrated-circuit hardware description language (VHDL)-based fault injection. In this way, we have been able to modify different intermittent fault parameters, to select various targets, and even, to compare the impact of intermittent faults with those induced by transient and permanent faults.This work was supported by the Spanish Government under the Research Project TIN2009-13825 and by the Universitat Politecnica de Valencia under the Project SP20120806. Associate Editor: L. Cui.Gracia-Morán, J.; Baraza Calvo, JC.; Gil Tomás, DA.; Saiz-Adalid, L.; Gil, P. (2014). Effects of intermittent faults on the reliability of a Reduced Instruction Set Computing (RISC) microprocessor. IEEE Transactions on Reliability. 63(1):144-153. https://doi.org/10.1109/TR.2014.2299711S14415363

    Reducing the Overhead of BCH Codes: New Double Error Correction Codes

    Full text link
    [EN] The Bose-Chaudhuri-Hocquenghem (BCH) codes are a well-known class of powerful error correction cyclic codes. BCH codes can correct multiple errors with minimal redundancy. Primitive BCH codes only exist for some word lengths, which do not frequently match those employed in digital systems. This paper focuses on double error correction (DEC) codes for word lengths that are in powers of two (8, 16, 32, and 64), which are commonly used in memories. We also focus on hardware implementations of the encoder and decoder circuits for very fast operations. This work proposes new low redundancy and reduced overhead (LRRO) DEC codes, with the same redundancy as the equivalent BCH DEC codes, but whose encoder, and decoder circuits present a lower overhead (in terms of propagation delay, silicon area usage and power consumption). We used a methodology to search parity check matrices, based on error patterns, in order to design the new codes. We implemented and synthesized them, and compared their results with those obtained for the BCH codes. Our implementation of the decoder circuits achieved reductions between 2.8% and 8.7% in the propagation delay, between 1.3% and 3.0% in the silicon area, and between 15.7% and 26.9% in the power consumption. Therefore, we propose LRRO codes as an alternative for protecting information against multiple errors.This research was supported in part by the Spanish Government, project TIN2016-81075-R, by Primeros Proyectos de Investigacion (PAID-06-18), Vicerrectorado de Investigacion, Innovacion y Transferencia de la Universitat Politecnica de Valencia (UPV), project 20190032, and by the Institute of Information and Communication Technologies (ITACA).Saiz-Adalid, L.; Gracia-Morán, J.; Gil Tomás, DA.; Baraza Calvo, JC.; Gil, P. (2020). Reducing the Overhead of BCH Codes: New Double Error Correction Codes. Electronics. 9(11):1-14. https://doi.org/10.3390/electronics9111897S114911Fujiwara, E. (2005). Code Design for Dependable Systems. doi:10.1002/0471792748Xinmiao, Z. (2017). VLSI Architectures for Modern Error-Correcting Codes. doi:10.1201/b18673Bose, R. C., & Ray-Chaudhuri, D. K. (1960). On a class of error correcting binary group codes. Information and Control, 3(1), 68-79. doi:10.1016/s0019-9958(60)90287-4Chen, P., Zhang, C., Jiang, H., Wang, Z., & Yue, S. (2015). High performance low complexity BCH error correction circuit for SSD controllers. 2015 IEEE International Conference on Electron Devices and Solid-State Circuits (EDSSC). doi:10.1109/edssc.2015.7285089IEEE 802.3-2018 - IEEE Standard for Ethernethttps://standards.ieee.org/standard/802_3-2018.htmlH.263: Video Coding for Low Bit Rate Communicationhttps://www.itu.int/rec/T-REC-H.263/enVangelista, L., Benvenuto, N., Tomasin, S., Nokes, C., Stott, J., Filippi, A., … Morello, A. (2009). Key technologies for next-generation terrestrial digital television standard DVB-T2. IEEE Communications Magazine, 47(10), 146-153. doi:10.1109/mcom.2009.52738222013 ITRS—International Technology Roadmap for Semiconductorshttp://www.itrs2.net/2013-itrs.htmlIbe, E., Taniguchi, H., Yahagi, Y., Shimbo, K., & Toba, T. (2010). Impact of Scaling on Neutron-Induced Soft Error in SRAMs From a 250 nm to a 22 nm Design Rule. IEEE Transactions on Electron Devices, 57(7), 1527-1538. doi:10.1109/ted.2010.2047907Gil-Tomás, D., Gracia-Morán, J., Baraza-Calvo, J.-C., Saiz-Adalid, L.-J., & Gil-Vicente, P.-J. (2012). Studying the effects of intermittent faults on a microcontroller. Microelectronics Reliability, 52(11), 2837-2846. doi:10.1016/j.microrel.2012.06.004Neubauer, A., Freudenberger, J., & Khn, V. (2007). Coding Theory. doi:10.1002/9780470519837Morelos-Zaragoza, R. H. (2006). The Art of Error Correcting Coding. doi:10.1002/0470035706Naseer, R., & Draper, J. (2008). DEC ECC design to improve memory reliability in Sub-100nm technologies. 2008 15th IEEE International Conference on Electronics, Circuits and Systems. doi:10.1109/icecs.2008.4674921Saiz-Adalid, L.-J., Gracia-Moran, J., Gil-Tomas, D., Baraza-Calvo, J.-C., & Gil-Vicente, P.-J. (2019). Ultrafast Codes for Multiple Adjacent Error Correction and Double Error Detection. IEEE Access, 7, 151131-151143. doi:10.1109/access.2019.2947315Saiz-Adalid, L.-J., Gil-Vicente, P.-J., Ruiz-García, J.-C., Gil-Tomás, D., Baraza, J.-C., & Gracia-Morán, J. (2013). Flexible Unequal Error Control Codes with Selectable Error Detection and Correction Levels. Computer Safety, Reliability, and Security, 178-189. doi:10.1007/978-3-642-40793-2_17Saiz-Adalid, L.-J., Reviriego, P., Gil, P., Pontarelli, S., & Maestro, J. A. (2015). MCU Tolerance in SRAMs Through Low-Redundancy Triple Adjacent Error Correction. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 23(10), 2332-2336. doi:10.1109/tvlsi.2014.2357476Gracia-Moran, J., Saiz-Adalid, L. J., Gil-Tomas, D., & Gil-Vicente, P. J. (2018). Improving Error Correction Codes for Multiple-Cell Upsets in Space Applications. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 26(10), 2132-2142. doi:10.1109/tvlsi.2018.2837220Cadence: Computational Software for Intelligent System Designhttps://www.cadence.comStine, J. E., Castellanos, I., Wood, M., Henson, J., Love, F., Davis, W. R., … Jenkal, R. (2007). FreePDK: An Open-Source Variation-Aware Design Kit. 2007 IEEE International Conference on Microelectronic Systems Education (MSE’07). doi:10.1109/mse.2007.44NanGate FreePDK45 Open Cell Libraryhttp://www.nangate.com/?page_id=232

    Proposal of an Adaptive Fault Tolerance Mechanism to Tolerate Intermittent Faults in RAM

    Full text link
    [EN] Due to transistor shrinking, intermittent faults are a major concern in current digital systems. This work presents an adaptive fault tolerance mechanism based on error correction codes (ECC), able to modify its behavior when the error conditions change without increasing the redundancy. As a case example, we have designed a mechanism that can detect intermittent faults and swap from an initial generic ECC to a specific ECC capable of tolerating one intermittent fault. We have inserted the mechanism in the memory system of a 32-bit RISC processor and validated it by using VHDL simulation-based fault injection. We have used two (39, 32) codes: a single error correction-double error detection (SEC-DED) and a code developed by our research group, called EPB3932, capable of correcting single errors and double and triple adjacent errors that include a bit previously tagged as error-prone. The results of injecting transient, intermittent, and combinations of intermittent and transient faults show that the proposed mechanism works properly. As an example, the percentage of failures and latent errors is 0% when injecting a triple adjacent fault after an intermittent stuck-at fault. We have synthesized the adaptive fault tolerance mechanism proposed in two types of FPGAs: non-reconfigurable and partially reconfigurable. In both cases, the overhead introduced is affordable in terms of hardware, time and power consumption.This research was supported in part by the Spanish Government, project TIN2016-81,075-R, and by Primeros Proyectos de Investigacion (PAID-06-18), Vicerrectorado de Investigacion, Innovacion y Transferencia de la Universitat Politecnica de Valencia (UPV), project 20190032.Baraza Calvo, JC.; Gracia-Morán, J.; Saiz-Adalid, L.; Gil Tomás, DA.; Gil, P. (2020). Proposal of an Adaptive Fault Tolerance Mechanism to Tolerate Intermittent Faults in RAM. Electronics. 9(12):1-30. https://doi.org/10.3390/electronics9122074S130912International Technology Roadmap for Semiconductors (ITRS)http://www.itrs2.net/2013-itrs.htmlJeng, S.-L., Lu, J.-C., & Wang, K. (2007). A Review of Reliability Research on Nanotechnology. IEEE Transactions on Reliability, 56(3), 401-410. doi:10.1109/tr.2007.903188Ibe, E., Taniguchi, H., Yahagi, Y., Shimbo, K., & Toba, T. (2010). Impact of Scaling on Neutron-Induced Soft Error in SRAMs From a 250 nm to a 22 nm Design Rule. IEEE Transactions on Electron Devices, 57(7), 1527-1538. doi:10.1109/ted.2010.2047907Boussif, A., Ghazel, M., & Basilio, J. C. (2020). Intermittent fault diagnosability of discrete event systems: an overview of automaton-based approaches. Discrete Event Dynamic Systems, 31(1), 59-102. doi:10.1007/s10626-020-00324-yConstantinescu, C. (2003). Trends and challenges in VLSI circuit reliability. IEEE Micro, 23(4), 14-19. doi:10.1109/mm.2003.1225959Bondavalli, A., Chiaradonna, S., Di Giandomenico, F., & Grandoni, F. (2000). Threshold-based mechanisms to discriminate transient from intermittent faults. IEEE Transactions on Computers, 49(3), 230-245. doi:10.1109/12.841127Contant, O., Lafortune, S., & Teneketzis, D. (2004). Diagnosis of Intermittent Faults. Discrete Event Dynamic Systems, 14(2), 171-202. doi:10.1023/b:disc.0000018570.20941.d2Sorensen, B. A., Kelly, G., Sajecki, A., & Sorensen, P. W. (s. f.). An analyzer for detecting intermittent faults in electronic devices. Proceedings of AUTOTESTCON ’94. doi:10.1109/autest.1994.381590Gracia-Moran, J., Gil-Tomas, D., Saiz-Adalid, L. J., Baraza, J. C., & Gil-Vicente, P. J. (2010). Experimental validation of a fault tolerant microcomputer system against intermittent faults. 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN). doi:10.1109/dsn.2010.5544288Fujiwara, E. (2005). Code Design for Dependable Systems. doi:10.1002/0471792748Hamming, R. W. (1950). Error Detecting and Error Correcting Codes. Bell System Technical Journal, 29(2), 147-160. doi:10.1002/j.1538-7305.1950.tb00463.xSaiz-Adalid, L.-J., Gil-Vicente, P.-J., Ruiz-García, J.-C., Gil-Tomás, D., Baraza, J.-C., & Gracia-Morán, J. (2013). Flexible Unequal Error Control Codes with Selectable Error Detection and Correction Levels. Computer Safety, Reliability, and Security, 178-189. doi:10.1007/978-3-642-40793-2_17Frei, R., McWilliam, R., Derrick, B., Purvis, A., Tiwari, A., & Di Marzo Serugendo, G. (2013). Self-healing and self-repairing technologies. The International Journal of Advanced Manufacturing Technology, 69(5-8), 1033-1061. doi:10.1007/s00170-013-5070-2Maiz, J., Hareland, S., Zhang, K., & Armstrong, P. (s. f.). Characterization of multi-bit soft error events in advanced SRAMs. IEEE International Electron Devices Meeting 2003. doi:10.1109/iedm.2003.1269335Schroeder, B., Pinheiro, E., & Weber, W.-D. (2011). DRAM errors in the wild. Communications of the ACM, 54(2), 100-107. doi:10.1145/1897816.1897844BanaiyanMofrad, A., Ebrahimi, M., Oboril, F., Tahoori, M. B., & Dutt, N. (2015). Protecting caches against multi-bit errors using embedded erasure coding. 2015 20th IEEE European Test Symposium (ETS). doi:10.1109/ets.2015.7138735Kim, J., Sullivan, M., Lym, S., & Erez, M. (2016). All-Inclusive ECC: Thorough End-to-End Protection for Reliable Computer Memory. 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). doi:10.1109/isca.2016.60Hwang, A. A., Stefanovici, I. A., & Schroeder, B. (2012). Cosmic rays don’t strike twice. ACM SIGPLAN Notices, 47(4), 111-122. doi:10.1145/2248487.2150989Gil-Tomás, D., Gracia-Morán, J., Baraza-Calvo, J.-C., Saiz-Adalid, L.-J., & Gil-Vicente, P.-J. (2012). Studying the effects of intermittent faults on a microcontroller. Microelectronics Reliability, 52(11), 2837-2846. doi:10.1016/j.microrel.2012.06.004Plasma CPU Modelhttps://opencores.org/projects/plasmaArlat, J., Aguera, M., Amat, L., Crouzet, Y., Fabre, J.-C., Laprie, J.-C., … Powell, D. (1990). Fault injection for dependability validation: a methodology and some applications. IEEE Transactions on Software Engineering, 16(2), 166-182. doi:10.1109/32.44380Gil-Tomas, D., Gracia-Moran, J., Baraza-Calvo, J.-C., Saiz-Adalid, L.-J., & Gil-Vicente, P.-J. (2012). Analyzing the Impact of Intermittent Faults on Microprocessors Applying Fault Injection. IEEE Design & Test of Computers, 29(6), 66-73. doi:10.1109/mdt.2011.2179514Rashid, L., Pattabiraman, K., & Gopalakrishnan, S. (2010). Modeling the Propagation of Intermittent Hardware Faults in Programs. 2010 IEEE 16th Pacific Rim International Symposium on Dependable Computing. doi:10.1109/prdc.2010.52Amiri, M., Siddiqui, F. M., Kelly, C., Woods, R., Rafferty, K., & Bardak, B. (2016). FPGA-Based Soft-Core Processors for Image Processing Applications. Journal of Signal Processing Systems, 87(1), 139-156. doi:10.1007/s11265-016-1185-7Hailesellasie, M., Hasan, S. R., & Mohamed, O. A. (2019). MulMapper: Towards an Automated FPGA-Based CNN Processor Generator Based on a Dynamic Design Space Exploration. 2019 IEEE International Symposium on Circuits and Systems (ISCAS). doi:10.1109/iscas.2019.8702589Mittal, S. (2018). A survey of FPGA-based accelerators for convolutional neural networks. Neural Computing and Applications, 32(4), 1109-1139. doi:10.1007/s00521-018-3761-1Intel Completes Acquisition of Alterahttps://newsroom.intel.com/news-releases/intel-completes-acquisition-of-altera/#gs.mi6ujuAMD to Acquire Xilinx, Creating the Industry’s High Performance Computing Leaderhttps://www.amd.com/en/press-releases/2020-10-27-amd-to-acquire-xilinx-creating-the-industry-s-high-performance-computingKim, K. H., & Lawrence, T. F. (s. f.). Adaptive fault tolerance: issues and approaches. [1990] Proceedings. Second IEEE Workshop on Future Trends of Distributed Computing Systems. doi:10.1109/ftdcs.1990.138292Gonzalez, O., Shrikumar, H., Stankovic, J. A., & Ramamritham, K. (s. f.). Adaptive fault tolerance and graceful degradation under dynamic hard real-time scheduling. Proceedings Real-Time Systems Symposium. doi:10.1109/real.1997.641271Jacobs, A., George, A. D., & Cieslewski, G. (2009). Reconfigurable fault tolerance: A framework for environmentally adaptive fault mitigation in space. 2009 International Conference on Field Programmable Logic and Applications. doi:10.1109/fpl.2009.5272313Shin, D., Park, J., Park, J., Paul, S., & Bhunia, S. (2017). Adaptive ECC for Tailored Protection of Nanoscale Memory. IEEE Design & Test, 34(6), 84-93. doi:10.1109/mdat.2016.2615844Silva, F., Muniz, A., Silveira, J., & Marcon, C. (2020). CLC-A: An Adaptive Implementation of the Column Line Code (CLC) ECC. 2020 33rd Symposium on Integrated Circuits and Systems Design (SBCCI). doi:10.1109/sbcci50935.2020.9189901Mukherjee, S. S., Emer, J., Fossum, T., & Reinhardt, S. K. (s. f.). Cache scrubbing in microprocessors: myth or necessity? 10th IEEE Pacific Rim International Symposium on Dependable Computing, 2004. Proceedings. doi:10.1109/prdc.2004.1276550Saleh, A. M., Serrano, J. J., & Patel, J. H. (1990). Reliability of scrubbing recovery-techniques for memory systems. IEEE Transactions on Reliability, 39(1), 114-122. doi:10.1109/24.52622X9SRA User’s Manual (Rev. 1.1)https://www.manualshelf.com/manual/supermicro/x9sra/user-s-manual-1-1.htmlChishti, Z., Alameldeen, A. R., Wilkerson, C., Wu, W., & Lu, S.-L. (2009). Improving cache lifetime reliability at ultra-low voltages. Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture - Micro-42. doi:10.1145/1669112.1669126Datta, R., & Touba, N. A. (2011). Designing a fast and adaptive error correction scheme for increasing the lifetime of phase change memories. 29th VLSI Test Symposium. doi:10.1109/vts.2011.5783773Kim, J., Lim, J., Cho, W., Shin, K.-S., Kim, H., & Lee, H.-J. (2016). Adaptive Memory Controller for High-performance Multi-channel Memory. JSTS:Journal of Semiconductor Technology and Science, 16(6), 808-816. doi:10.5573/jsts.2016.16.6.808Yuan, L., Liu, H., Jia, P., & Yang, Y. (2015). Reliability-Based ECC System for Adaptive Protection of NAND Flash Memories. 2015 Fifth International Conference on Communication Systems and Network Technologies. doi:10.1109/csnt.2015.23Zhou, Y., Wu, F., Lu, Z., He, X., Huang, P., & Xie, C. (2019). SCORE. ACM Transactions on Architecture and Code Optimization, 15(4), 1-25. doi:10.1145/3291052Lu, S.-K., Li, H.-P., & Miyase, K. (2018). Adaptive ECC Techniques for Reliability and Yield Enhancement of Phase Change Memory. 2018 IEEE 24th International Symposium on On-Line Testing And Robust System Design (IOLTS). doi:10.1109/iolts.2018.8474118Chen, J., Andjelkovic, M., Simevski, A., Li, Y., Skoncej, P., & Krstic, M. (2019). Design of SRAM-Based Low-Cost SEU Monitor for Self-Adaptive Multiprocessing Systems. 2019 22nd Euromicro Conference on Digital System Design (DSD). doi:10.1109/dsd.2019.00080Wang, X., Jiang, L., & Chakrabarty, K. (2020). LSTM-based Analysis of Temporally- and Spatially-Correlated Signatures for Intermittent Fault Detection. 2020 IEEE 38th VLSI Test Symposium (VTS). doi:10.1109/vts48691.2020.9107600Ebrahimi, H., & G. Kerkhoff, H. (2018). Intermittent Resistance Fault Detection at Board Level. 2018 IEEE 21st International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS). doi:10.1109/ddecs.2018.00031Ebrahimi, H., & Kerkhoff, H. G. (2020). A New Monitor Insertion Algorithm for Intermittent Fault Detection. 2020 IEEE European Test Symposium (ETS). doi:10.1109/ets48528.2020.9131563Hsiao, M. Y. (1970). A Class of Optimal Minimum Odd-weight-column SEC-DED Codes. IBM Journal of Research and Development, 14(4), 395-401. doi:10.1147/rd.144.0395Benso, A., & Prinetto, P. (Eds.). (2004). Fault Injection Techniques and Tools for Embedded Systems Reliability Evaluation. Frontiers in Electronic Testing. doi:10.1007/b105828Gracia, J., Saiz, L. J., Baraza, J. C., Gil, D., & Gil, P. J. (2008). Analysis of the influence of intermittent faults in a microcontroller. 2008 11th IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems. doi:10.1109/ddecs.2008.4538761ZC702 Evaluation Board for the Zynq-7000 XC7Z020 SoChttps://www.xilinx.com/support/documentation/boards_and_kits/zc702_zvik/ug850-zc702-eval-bd.pd

    Ultrafast Codes for Multiple Adjacent Error Correction and Double Error Detection

    Full text link
    (c) 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.[EN] Reliable computer systems employ error control codes (ECCs) to protect information from errors. For example, memories are frequently protected using single error correction-double error detection (SEC-DED) codes. ECCs are traditionally designed to minimize the number of redundant bits, as they are added to each word in the whole memory. Nevertheless, using an ECC introduces encoding and decoding latencies, silicon area usage and power consumption. In other computer units, these parameters should be optimized, and redundancy would be less important. For example, protecting registers against errors remains a major concern for deep sub-micron systems due to technology scaling. In this case, an important requirement for register protection is to keep encoding and decoding latencies as short as possible. Ultrafast error control codes achieve very low delays, independently of the word length, increasing the redundancy. This paper summarizes previous works on Ultrafast codes (SEC and SEC-DED), and proposes new codes combining double error detection and adjacent error correction. We have implemented, synthesized and compared different Ultrafast codes with other state-of-the-art fast codes. The results show the validity of the approach, achieving low latencies and a good balance with silicon area and power consumption.This work was supported in part by the Spanish Government under Project TIN2016-81075-R, and in part by the Primeros Proyectos de Investigacion, Vicerrectorado de Investigacion, Innovacion y Transferencia de la Universitat Politecnica de Valencia (UPV), Valencia, Spain, under Project PAID-06-18 20190032.Saiz-Adalid, L.; Gracia-Morán, J.; Gil Tomás, DA.; Baraza Calvo, JC.; Gil, P. (2019). Ultrafast Codes for Multiple Adjacent Error Correction and Double Error Detection. IEEE Access. 7:151131-151143. https://doi.org/10.1109/ACCESS.2019.2947315S151131151143

    Enhancement of fault injection techniques based on the modification of VHDL code

    Full text link
    Deep submicrometer devices are expected to be increasingly sensitive to physical faults. For this reason, fault-tolerance mechanisms are more and more required in VLSI circuits. So, validating their dependability is a prior concern in the design process. Fault injection techniques based on the use of hardware description languages offer important advantages with regard to other techniques. First, as this type of techniques can be applied during the design phase of the system, they permit reducing the time-to-market. Second, they present high controllability and reachability. Among the different techniques, those based on the use of saboteurs and mutants are especially attractive due to their high fault modeling capability. However, implementing automatically these techniques in a fault injection tool is difficult. Especially complex are the insertion of saboteurs and the generation of mutants. In this paper, we present new proposals to implement saboteurs and mutants for models in VHDL which are easy-to-automate, and whose philosophy can be generalized to other hardware description languages.Baraza Calvo, JC.; Gracia-Morán, J.; Blanc Clavero, S.; Gil Tomás, DA.; Gil Vicente, PJ. (2008). Enhancement of fault injection techniques based on the modification of VHDL code. IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 16(6):693-706. doi:10.1109/TVLSI.2008.2000254S69370616

    Injecting Intermittent Faults for the Dependability Assessment of a Fault-Tolerant Microcomputer System

    Full text link
    © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.As scaling is more and more aggressive, intermittent faults are increasing their importance in current deep submicron complementary metal-oxide-semiconductor (CMOS) technologies. This work shows the dependability assessment of a fault-tol- erant computer system against intermittent faults. The applied methodology lies in VHDL-based fault injection, which allows the assessment in early design phases, together with a high level of observability and controllability. The evaluated system is a duplex microcontroller system with cold stand-by sparing. A wide set of intermittent fault models have been injected, and from the simulation traces, coverages and latencies have been measured. Markov models for this system have been generated and some dependability functions, such as reliability and safety, have been calculated. From these results, some enhancements of detection and recovery mechanisms have been suggested. The methodology presented is general to any fault-tolerant computer system.This work was supported in part by the Universitat Politecnica de Valencia under the Research Project SP20120806, and in part by the Spanish Government under the Research Project TIN2012-38308-C02-01. Associate Editor: J. Shortle.Gil Tomás, DA.; Gracia Morán, J.; Baraza Calvo, JC.; Saiz Adalid, LJ.; Gil Vicente, PJ. (2016). Injecting Intermittent Faults for the Dependability Assessment of a Fault-Tolerant Microcomputer System. IEEE Transactions on Reliability. 65(2):648-661. https://doi.org/10.1109/TR.2015.2484058S64866165

    Design, Implementation and Evaluation of a Low Redundant Error Correction Code

    Full text link
    [EN] The continuous raise in the integration scale of CMOS technology has provoked an augment in the fault rate. Particularly, computer memory is affected by Single Cell Upsets (SCU) and Multiple Cell Upsets (MCU). A common method to tolerate errors in this element is the use of Error Correction Codes (ECC). The addition of an ECC introduces a series of overheads: silicon area, power consumption and delay overheads of encoding and decoding circuits, as well as several extra bits added to allow detecting and/or correcting errors. ECC can be designed with different parameters in mind: low redundancy, low delay, error coverage, etc. The idea of this paper is to study the effects produced when adding an ECC to a microprocessor with respect to overheads. Usually, ECC with different characteristics are continuously proposed. However, a great quantity of these proposals only present the ECC, not showing its behavior when using them in a microprocessor. In this work, we present the design of an ECC whose main characteristic is a low number of code bits (low redundancy). Then, we study the overhead this ECC introduces. Firstly, we show a study of silicon area, delay and power consumption of encoder and decoder circuits, and secondly, how the addition of this ECC affects to a RISC microprocessor.© 2021 IEEE. Personal use of this material is permitted. Permissíon from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertisíng or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Gracia-Morán, J.; Saiz-Adalid, L.; Baraza-Calvo, J.; Gil Tomás, DA.; Gil, P. (2021). Design, Implementation and Evaluation of a Low Redundant Error Correction Code. IEEE Latin America Transactions. 19(11):1903-1911. https://doi.org/10.1109/TLA.2021.947562419031911191

    Contribución a la validación de sistemas complejos tolerantes a fallos en la fase de diseño. Nuevos modelos de fallos y técnicas de inyección de fallos

    Full text link
    En el diseño de sistemas informáticos (y en particular, de aquéllos en los que, por las características del servicio que prestan, un mal funcionamiento puede provocar pérdida de vidas humanas, perjuicio económico, suspensión de servicios primordiales, etc.), se establece como prioridad esencial conseguir que funcionen correctamente durante el mayor tiempo posible y con un elevado nivel de eficacia. Los sistemas que regulan servicios críticos disponen de unos mecanismos especiales que les proporcionan una cierta inmunidad a la ocurrencia de averías que puedan causar un cese o deterioro del servicio prestado. Por ello, se les denomina Sistemas Tolerantes a Fallos, o STF. Se define el concepto de Confiabilidad como un conjunto de funciones (o atributos) que permiten cuantificar la calidad del servicio prestado en cuanto a averías producidas, y en consecuencia, el grado de confianza que el usuario puede depositar en el sistema. Al desarrollar cualquier sistema tolerante a fallos es preciso validarlo, o lo que es lo mismo, cuantificar sus parámetros de Confiabilidad. Entre los numerosos métodos y técnicas existentes para validar sistemas tolerantes a fallos, esta tesis se ha centrado en un método de validación experimental: las técnicas de inyección de fallos basadas en la simulación de modelos en VHDL. Las principales ventajas de este conjunto de técnicas son que se pueden aplicar en la fase de diseño del sistema y que permiten acceder a cualquier elemento del modelo del sistema. Por contra, presentan el inconveniente de que, sobre todo en modelos de sistemas complejos, la inyección de los fallos supone un elevado coste temporal. Sin embargo, sus importantes ventajas las hacen lo suficientemente atractivas como para ser utilizadas al menos como técnica complementaria de otras más utilizadas por su bajo coste y sencillez de implementación, como SWIFI (software implemented fault injection). Un aspecto muy importante de las técnicas de inyección de fallos mediante simulaciBaraza Calvo, JC. (2003). Contribución a la validación de sistemas complejos tolerantes a fallos en la fase de diseño. Nuevos modelos de fallos y técnicas de inyección de fallos [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/2345Palanci

    On-line Vocational Training for Computer-Based Safety and Security using Competence- and Work-Based Learning.

    Full text link
    [EN] Nowadays, computer systems are present in almost all areas of life. However, it is very difficult to guarantee a determined Safety and Security level. Weak or incorrectly deployed Safety and Security policies may lead to unaffordable economical, human or reputation loses. However, current undergraduate programs rarely address Safety and Security in depth, and the definition of specific competences required for Safety and Security engineers is difficult, since actual and future necessities of ICT professionals should be considered. Work-Based Learning (WBL) can be used for professionals’ vocational training, as it promotes the acquisition of knowledge, as well as the development of skills. However, training professionals implies working with learners with high potential mobility, limited amount of time and heterogeneous prior learning. So, an on-line approach should be used. But, how the competences acquired by professionals can be recognized across Europe? The European Credit for Vocational Education and Training (ECVET) framework eases the transfer and recognition of the resulting competences among different countries and educational contexts. In this work, the RISKY project is presented. Funded by the European Union through the Leonardo da Vinci programme, this project tries to improve Safety and Security professionals’ training. To do this, it will use an on-line approach, based on competences and WBL.[ES] Actualmente, los sistemas informáticos están presentes en casi todos los ámbitos de la vida. Sin embargo, es muy difícil garantizar un determinado nivel de Seguridad e Inocuidad. Políticas débiles o mal implementadas pueden conducir a enormes pérdidas económicas, de vidas humanas o de reputación. Sin embargo, dichos aspectos no son abordados en profundidad durante la formación universitaria. Además, la definición de las competencias específicas en Seguridad e Inocuidad requeridas por los ingenieros en TIC es muy difícil, ya que se deben considerar sus necesidades actuales y futuras. El aprendizaje basado en el trabajo (del inglés Work-Based Learning o WBL) es una técnica muy útil para la formación de profesionales, ya que promueve la adquisición de conocimiento, así como el de sarrollo de habilidades. Pero la formación de profesionales implica trabajar con aprendices con una gran movilidad potencial, una cantidad limitada de tiempo y unos conocimientos previos bastante heterogéneos. Así, pues, se debe usar una aproximación on-line. Pero, ¿cómo pueden ser reconocidas las competencias adquiridas por un profesional en toda Europa? El entorno de trabajo European Credit for Vocational Education and Training (ECVET) facilita la transferencia y el reconocimiento de las competencias adquiridas en diferentes países y contextos educacionales. En este trabajo se presenta el proyecto RISKY. Financiado por la Unión Europea a través del programa Leonardo da Vinci, este proyecto pretende mejorar la formación de profesionales en Seguridad e Inocuidad. Para ello, utilizará un enfoque on-line, basado en competencias y WBL.This work has been funded by the RISKY Leonardo da Vinci project (#2011-1-FR1-LEO05-24482) from the European Commission. This publication reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.Gracia-Morán, J.; Ruiz García, JC.; Andrés Martínez, DD.; Baraza Calvo, JC.; Gil Vicente, PJ. (2014). Formación on-line para profesionales en el campo de la Seguridad y la Inocuidad Informática a través de Competencias y Aprendizaje Basado en el Trabajo. Revista de Formaciónn e Innovación Educativa Universitaria. 7(3):143-154. http://hdl.handle.net/10251/46460S1431547

    Analyzing the impact of intermittent faults on microprocessors applying fault injection

    Full text link
    Intermittent faults, being serious concerns for deep-submicron integrated circuits, are not well studied in the literature. This paper performs fault injection simulation to analyze the impact of intermittent faults, which is an important step towards the development of mitigation techniques for such threats.This work was supported by the Spanish Government under the project TIN2009-13825.Gil Tomás, DA.; Gracia-Morán, J.; Baraza Calvo, JC.; Saiz-Adalid, L.; Gil, P. (2012). Analyzing the impact of intermittent faults on microprocessors applying fault injection. IEEE Design and Test of Computers. 29(6):66-73. https://doi.org/10.1109/MDT.2011.2179514S667329
    corecore